An Efficient Nonlinear Regression Approach for Genome-Wide Detection of Marginal and Interacting Genetic Variations
Abstract
Genome-wide association studies have revealed individual genetic variants associated with phenotypic traits such as disease risk and gene expression. However, detecting pairwise interaction effects of genetic variants on traits remains a challenge because of the large number of variant combinations (∼10^11 SNP pairs in the human genome) and relatively small sample sizes (typically <10^4). Despite recent breakthroughs in detecting interaction effects, several problems remain open, including: (1) how to quickly process a large number of SNP pairs, (2) how to distinguish true signals from SNPs/SNP pairs merely correlated with true signals, (3) how to detect nonlinear associations between SNP pairs and traits given small sample sizes, and (4) how to control false positives. In this article, we present a unified framework, called SPHINX, which addresses these challenges. We first propose a piecewise linear model for interaction detection, because it is simple enough for its parameters to be estimated from small samples but complex enough to capture nonlinear interaction effects. Then, based on the piecewise linear model, we introduce randomized group lasso under stability selection, and a screening algorithm, to address the statistical and computational challenges mentioned above. In our experiments, we first demonstrate that SPHINX achieves better power than existing methods for interaction detection under false positive control. We further applied SPHINX to a late-onset Alzheimer's disease dataset, and report 16 SNPs and 17 SNP pairs associated with gene traits. We also present a highly scalable implementation of our screening algorithm, which can screen ∼118 billion candidate associations on a 60-node cluster in <5.5 hours.
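To give a feel for the scale quoted in the abstract, the following is a minimal back-of-the-envelope sketch in Python. It is not part of SPHINX. The SNP count of 500,000 is an assumption chosen only because it yields a pair count on the order of 10^11; the helper names `n_pairs` and `candidates_per_second` are hypothetical. The throughput figures are derived solely from the "∼118 billion candidates, 60 nodes, <5.5 hours" statement and are therefore approximate lower bounds.

```python
from math import comb


def n_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs among n_snps variants."""
    return comb(n_snps, 2)


def candidates_per_second(total: float, hours: float, nodes: int = 1) -> float:
    """Average screening rate per node, assuming work is split evenly."""
    return total / (hours * 3600 * nodes)


# Assumed genotyping-array size (not stated in the abstract).
n_snps = 500_000
pairs = n_pairs(n_snps)

# Figures quoted in the abstract: ~118 billion candidates, 60 nodes, <5.5 hours.
rate_cluster = candidates_per_second(118e9, 5.5)
rate_node = candidates_per_second(118e9, 5.5, nodes=60)

print(f"SNP pairs for {n_snps:,} SNPs: {pairs:,}")
print(f"Cluster-wide rate: at least {rate_cluster:,.0f} candidates/s")
print(f"Per-node rate:     at least {rate_node:,.0f} candidates/s")
```

Because the pair count grows quadratically with the number of SNPs, doubling the SNP count roughly quadruples the number of candidates, which is why a screening step before model fitting matters at this scale.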
Related articles
O-36: Genome Haplotyping and Detection of Meiotic Homologous Recombination Sites in Single Cells, A Generic Method for Preimplantation Genetic Diagnosis
Background: Haplotyping is invaluable not only to identify genetic variants underlying a disease or trait, but also to study evolution and population history as well as meiotic and mitotic recombination processes. Current genome-wide haplotyping methods rely on genomic DNA that is extracted from a large number of cells. Thus far random allele drop out and preferential amplification artifacts of...
I-45: FISH and Array CGH for PGD of Cancer
We developed several FISH approaches to enable preimplantation genetic diagnosis of cancer predisposition syndromes. An overview of the applications and the results of those PGDs will be provided. In addition we developed several novel tools to genome wide screen for CNVs and SNPs in single cells. Those technologies are now being applied for polar body, blastomere and blastocyst screening for c...
A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging
MOTIVATION: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally ...
Trees Assembling Mann-Whitney approach for detecting genome-wide joint association among low-marginal-effect loci.
Common complex diseases are likely influenced by the interplay of hundreds, or even thousands, of genetic variants. Converging evidence shows that genetic variants with low marginal effects (LMEs) play an important role in disease development. Despite their potential significance, discovering LME genetic variants and assessing their joint association on high-dimensional data (e.g., genome-wide ...
Unveiling the genetic loci for a panicle developmental trait using genome-wide association study in rice
Panicle size has a high correlation with grain yield in rice. There is a bottleneck to identify the additional quantitative trait loci (QTL) for panicle size due to the conventional traits used for QTL mapping. To identify more genetic loci for panicle size, a panicle developmental trait (LNTB, the length from panicle neck-knot to the first primary branch in the rachis) related to panicle size ...
Journal:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 23, Issue 5
Pages: -
Publication date: 2015